Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Armv7-M: Allow register overlap in ldm + ldrd #153

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

SH1E0r1r2y
Copy link

Fixed the splitting of ldrd and ldm when the address register and output register overlap in ldrd_imm_splitting_cb and ldm_interval_splitting_cb.

Copy link
Collaborator

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Can you please add a new example to test that this works.
A simple

ldrd r0, r1, [r0]
ldm r0, {r0-r3}

should do.

slothy/targets/arm_v7m/arch_v7m.py Outdated Show resolved Hide resolved
slothy/targets/arm_v7m/arch_v7m.py Outdated Show resolved Hide resolved
@mkannwischer
Copy link
Collaborator

Thanks!

Can you please add a new example to test that this works. A simple

ldrd r0, r1, [r0]
ldm r0, {r0-r3}

should do.

Or simply extend armv7m_simple0.s

@mkannwischer mkannwischer changed the title fixed the register overlap Armv7-M: Allow register overlap in ldm + ldrd Jan 9, 2025
Copy link
Collaborator

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your changes. Almost there.

examples/naive/armv7m/armv7m_simple0.s Outdated Show resolved Hide resolved
examples/opt/armv7m/armv7m_simple0_opt_m7.s Outdated Show resolved Hide resolved
example.py Show resolved Hide resolved
@@ -1486,7 +1486,7 @@ def make(cls, src):
obj.increment = None
obj.pre_index = 0
obj.addr = obj.args_in[0]
obj.args_in_out_different = [(0,0)] # Can't have Rd==Ra
#obj.args_in_out_different = [(0,0)] # Can't have Rd==Ra
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove those, not comment them out.
Also we need to test if this affects any other examples in SLOTHY.

For that please make sure you have a clean copy of SLOTHY, and then run

python3 example.py --timeout 60 --only-target=slothy.targets.arm_v7m.cortex_m7

This is going to run for a few hours. Then zip up the output files in examples/opt/armv7m and attach them to this PR.

@mkannwischer mkannwischer force-pushed the overlap branch 4 times, most recently from 0e257f4 to 8871294 Compare February 4, 2025 08:39
Previously ldm and ldrd fusion would break if
the same register is used as address as one of
the outputs (and it's not the last output).
This commit fixes that by changing the fusion to
re-order the ldr overwriting the address to the
very end in case there is an overlap.

Note that this is not needed for stm/strd as
there you cannot have an overlap.

Additionally, it removes unnecessary restrictions
disallowing Rd=Ra for ldrb/ldrh/ldr.
@mkannwischer
Copy link
Collaborator

I cleaned this up, but I still need to run a full test with something like

python3 example.py --timeout 300 --only-target=slothy.targets.arm_v7m.cortex_m7

(shorter timeout does not work for the dilithium ntt).
Going to do that tomorrow. Converting to draft for now.

@mkannwischer mkannwischer marked this pull request as draft February 4, 2025 09:57
@mkannwischer
Copy link
Collaborator

I re-ran with larger timeout (300 is still not enough for the dilithiume ntt, but 600 seems fine on my machine).

Unfortunately, fnt_257_dilithium_m7 fails the selftest:

ERROR:fnt_257_dilithium_m7._fnt_3_4_5_6.split.122_173.selftest:Output registers:
ERROR:fnt_257_dilithium_m7._fnt_3_4_5_6.split.122_173.selftest:{'s27', 'r2', 's11', 's10', 's13', 's25', 's0', 's26', 's21', 's23', 's24', 'r1', 's5', 's19', 's20', 's22', 's9', 's29', 's1', 's3', 's4', 's12', 's17', 's6', 's15', 's28', 'r11', 's7', 'r9', 'r4', 's8', 's16', 'r3', 'r10', 's2', 'r12', 's18', 'r0', 's14', 'r13'}
Traceback (most recent call last):
  File "/Users/matthiaskannwischer/git/slothy/example.py", line 2953, in <module>
    main()
  File "/Users/matthiaskannwischer/git/slothy/example.py", line 2946, in main
    run_example(e, debug=args.debug, dry_run=args.dry_run,
  File "/Users/matthiaskannwischer/git/slothy/example.py", line 2942, in run_example
    ex.run(**kwargs)
  File "/Users/matthiaskannwischer/git/slothy/example.py", line 165, in run
    self.core(slothy, *self.extra_args)
  File "/Users/matthiaskannwischer/git/slothy/example.py", line 1711, in core
    slothy.optimize_loop("_fnt_3_4_5_6")
  File "/Users/matthiaskannwischer/git/slothy/slothy/core/slothy.py", line 511, in optimize_loop
    Heuristics.periodic(body, logger, c)
  File "/Users/matthiaskannwischer/git/slothy/slothy/core/heuristics.py", line 325, in periodic
    res = Heuristics.linear( body, logger=logger, conf=conf)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/matthiaskannwischer/git/slothy/slothy/core/heuristics.py", line 404, in linear
    return Heuristics._split(body, logger, conf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/matthiaskannwischer/git/slothy/slothy/core/heuristics.py", line 826, in _split
    return Heuristics._split_inner(body, logger, c)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/matthiaskannwischer/git/slothy/slothy/core/heuristics.py", line 772, in _split_inner
    cur_body, stalls, local_perm = optimize_chunks_many(idx_lst, cur_body, stalls,
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/matthiaskannwischer/git/slothy/slothy/core/heuristics.py", line 699, in optimize_chunks_many
    body, stalls, cur_stalls, local_perm = optimize_chunk(start_idx, end_idx, body,
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/matthiaskannwischer/git/slothy/slothy/core/heuristics.py", line 675, in optimize_chunk
    result = Heuristics.optimize_binsearch(cur_body,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/matthiaskannwischer/git/slothy/slothy/core/heuristics.py", line 134, in optimize_binsearch
    return Heuristics.optimize_binsearch_internal(source, logger, conf,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/matthiaskannwischer/git/slothy/slothy/core/heuristics.py", line 248, in optimize_binsearch_internal
    success = core.optimize(source, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/matthiaskannwischer/git/slothy/slothy/core/core.py", line 1570, in optimize
    self._extract_result()
  File "/Users/matthiaskannwischer/git/slothy/slothy/core/core.py", line 1963, in _extract_result
    self._result.selftest(self.logger.getChild("selftest"))
  File "/Users/matthiaskannwischer/git/slothy/slothy/core/core.py", line 906, in selftest
    SelfTest.run(self.config, log, old_source, new_source, address_registers, regs_expected,
  File "/Users/matthiaskannwischer/git/slothy/slothy/helper.py", line 1396, in run
    raise SelfTestException(f"Selftest failed: Register mismatch for {r}: {hex(final_regs_old[r])} != {hex(final_regs_new[r])}")
slothy.helper.SelfTestException: Selftest failed: Register mismatch for r1: 0xe801bbf1 != 0xeebcff00

This need investigation before we can merge this. @dop-amin any ideas?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants